Approximate N-Gram Markov Model for Natural Language Generation

Authors

  • Hsin-Hsi Chen
  • Yue-Shi Lee
Abstract

This paper proposes an Approximate n-gram Markov Model for bag generation. Directed word association pairs with distances are used to approximate the (n-1)-gram and n-gram training tables. This model has the parameters of a word association model and the merits of both the word association model and the Markov model. The training knowledge for bag generation can also be applied to lexical selection in machine translation design.
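As a rough illustration of the underlying idea (not the paper's actual algorithm), the sketch below trains a plain bigram Markov model from directed word-pair counts and generates a sentence from a word bag by exhaustively scoring permutations; the function names, add-one smoothing, and exhaustive search are my own assumptions for the sketch.

```python
import itertools
from collections import defaultdict

def train_bigrams(sentences):
    """Count directed word pairs (a crude stand-in for association pairs)."""
    bigram = defaultdict(int)
    unigram = defaultdict(int)
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            bigram[(a, b)] += 1
            unigram[a] += 1
    return bigram, unigram

def score(order, bigram, unigram, vocab_size):
    """Product of add-one-smoothed bigram probabilities for one ordering."""
    tokens = ["<s>"] + list(order) + ["</s>"]
    p = 1.0
    for a, b in zip(tokens, tokens[1:]):
        p *= (bigram[(a, b)] + 1) / (unigram[a] + vocab_size)
    return p

def generate_from_bag(bag, bigram, unigram, vocab_size):
    """Pick the highest-scoring permutation of a small word bag."""
    return max(itertools.permutations(bag),
               key=lambda order: score(order, bigram, unigram, vocab_size))
```

Exhaustive permutation search is exponential in bag size, and full n-gram tables grow quickly with n; approximating those tables with distance-annotated association pairs, as the paper proposes, is what makes larger n practical.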

Related articles

Unsupervised Learning on an Approximate Corpus

Unsupervised learning techniques can take advantage of large amounts of unannotated text, but the largest text corpus (the Web) is not easy to use in its full form. Instead, we have statistics about this corpus in the form of n-gram counts (Brants and Franz, 2006). While n-gram counts do not directly provide sentences, a distribution over sentences can be estimated from them in the same way tha...

Learning Representations for Weakly Supervised Natural Language Processing Tasks

Finding the right representations for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This article investigates novel techniques for extracting features from n-gram models, Hidden Markov Models, and other statistical language models, including a novel Partial Lattice Markov Random Field model. Experiments on part-of-speech tagging and...

Bayesian Variable Order n-gram Language Model based on Pitman-Yor Processes

This paper proposes a variable order n-gram language model by extending a recently proposed model based on the hierarchical Pitman-Yor processes. Introducing a stochastic process on an infinite-depth suffix tree, we can infer the hidden n-gram context from which each word originated. Experiments on standard large corpora showed the validity and efficiency of the proposed model. Our architecture is ...

Estimating Comma Placement in Natural Language

We study the feasibility of identifying comma locations using both n-gram models and stochastic context-free grammars (SCFGs). Specifically, our algorithms take an input sentence without commas and return the positions where commas should be inserted, along with probability or confidence estimates. This can be generalized to correcting comma placement with minor modifications. However, we focus...
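The n-gram half of that comparison can be sketched very simply: for each gap between adjacent words, compare the probability of the path through a comma token against the direct transition. This is a hypothetical minimal sketch, not the authors' method; the geometric-mean normalization (the commaful path spans two transitions rather than one) and all names are my own assumptions.

```python
from collections import defaultdict

def count_bigrams(sentences):
    """Count directed word pairs, treating "," as an ordinary token."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            counts[(a, b)] += 1
            totals[a] += 1
    return counts, totals

def comma_positions(words, counts, totals):
    """Return indices where inserting "," looks more probable than not.

    The commaful path spans two transitions, so its probability is
    length-normalized with a geometric mean before the comparison.
    """
    def p(a, b):  # add-one-smoothed transition probability
        return (counts[(a, b)] + 1) / (totals[a] + len(totals) + 1)

    positions = []
    for i in range(len(words) - 1):
        with_comma = (p(words[i], ",") * p(",", words[i + 1])) ** 0.5
        without = p(words[i], words[i + 1])
        if with_comma > without:
            positions.append(i + 1)
    return positions
```

Each decision here is purely local; the SCFG approach the abstract mentions would instead condition comma placement on the parse structure of the whole sentence.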

Intra-sentence Punctuation Insertion in Natural Language Generation

We describe a punctuation insertion model used in the sentence realization module of a natural language generation system for English and German. The model is based on a decision tree classifier that uses linguistically sophisticated features. The classifier outperforms a word n-gram model trained on the same data.

Journal:
  • CoRR

Volume: abs/cmp-lg/9408012  Issue:

Pages: -

Publication date: 1994